
AIDP OpenAI Demo example - demonstrating the NVIDIA AI Data Platform Retrieval API following the OpenAI Vector Store Search specification, exposed via MCP. #12

Open

PicoNVIDIA wants to merge 2 commits into NVIDIA:main from PicoNVIDIA:add-aidp-openai-demo


Conversation


@PicoNVIDIA PicoNVIDIA commented Jan 7, 2026

Description

AIDP OpenAI Demo example, demonstrating the NVIDIA AI Data Platform Retrieval API following the OpenAI Vector Store Search specification, exposed via MCP.

  • OpenAI-compatible search_vector_store tool via MCP
  • NVIDIA NIMs for embeddings (nv-embedqa-e5-v5) and LLM reasoning (llama-3.1-70b-instruct)
  • Milvus vector database integration

Summary by CodeRabbit

  • New Features

    • Added complete AIDP Retrieval API demo project with OpenAI-compatible vector store search functionality
    • Includes REST API service and MCP server implementation for semantic search on support tickets
    • Added data loading utilities and example queries demonstrating search, filtering, and result ranking capabilities
  • Documentation

    • Added comprehensive README with setup instructions, architecture overview, and usage examples
  • Chores

    • Added project configuration and workflow setup files


This example demonstrates the NVIDIA AI Data Platform (AIDP) Retrieval API
following the OpenAI Vector Store Search specification, exposed via MCP.

Features:
- OpenAI-compatible search_vector_store tool via MCP
- NVIDIA NIMs for embeddings and LLM reasoning
- Milvus vector database integration
- Synthetic data generation using Nemotron LLM
- Full ReAct agent workflow with NeMo Agent Toolkit

Signed-off-by: Patrick Moorhead <pmoorhead@nvidia.com>
- rest_api.py: OpenAI-compatible REST endpoint
- examples.py: Comprehensive API usage examples
- Updated pyproject.toml with fastapi/uvicorn deps
- Updated README with complete file list

Signed-off-by: Patrick Moorhead <pmoorhead@nvidia.com>

copy-pr-bot bot commented Jan 7, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@PicoNVIDIA (author) commented:

@CodeRabbit review full


coderabbitai bot commented Jan 7, 2026

@PicoNVIDIA: I'll perform a full review of all the changes in this PR.

✅ Actions performed

Full review triggered.


coderabbitai bot commented Jan 7, 2026

Walkthrough

This pull request introduces a complete example project demonstrating AIDP Retrieval API with OpenAI compatibility. It includes project configuration, a data loading script, FastAPI REST service, MCP server implementation, workflow configuration, and example usage patterns that integrate Milvus vector database with NVIDIA embedding APIs.

Changes

Cohort / File(s) Summary
Project Configuration & Documentation
examples/aidp_openai_demo/README.md, examples/aidp_openai_demo/pyproject.toml, examples/aidp_openai_demo/src/nat_aidp_openai_demo/__init__.py
Adds README with end-to-end setup and usage guidance; project metadata with dependencies (nvidia-nat, pymilvus, fastapi, uvicorn, requests); SPDX license headers.
Data Loading
examples/aidp_openai_demo/scripts/load_support_tickets.py
Adds script to seed Milvus with synthetic support tickets via NVIDIA LLM; includes embedding generation, collection creation, fallback ticket generation, and semantic search validation.
REST API Implementation
examples/aidp_openai_demo/src/nat_aidp_openai_demo/rest_api.py
Implements OpenAI-compatible FastAPI service with Pydantic models, Bearer token authentication, Milvus vector search, NVIDIA embeddings integration, filtering, and error handling; exposes /v1/vector_stores and /v1/vector_stores/{vector_store_id}/search endpoints.
MCP Server Implementation
examples/aidp_openai_demo/src/nat_aidp_openai_demo/server.py
Implements FastMCP HTTP server exposing search_vector_store tool; lazy Milvus client initialization, NVIDIA embedding retrieval, OpenAI-compatible response formatting, and comprehensive error handling.
Configuration & Examples
examples/aidp_openai_demo/src/nat_aidp_openai_demo/configs/workflow.yml, examples/aidp_openai_demo/src/nat_aidp_openai_demo/examples.py
Adds YAML workflow configuration for react agent with Llama 3.1 LLM; examples module demonstrates API usage via five example scenarios (basic search, filtering, category filtering, full response format, OpenAI alignment) with health checks and error handling.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor Client
    participant FastAPI as FastAPI REST Service
    participant Auth as Authentication
    participant Embedding as NVIDIA Embeddings API
    participant Milvus as Milvus Vector DB
    participant Format as Response Formatter

    Client->>FastAPI: POST /v1/vector_stores/{id}/search

    FastAPI->>Auth: verify_api_key(Bearer token)
    Auth-->>FastAPI: ✓ API Key Valid

    rect rgb(200, 220, 240)
        Note over FastAPI,Milvus: Vector Search Flow
        FastAPI->>Embedding: GET embedding for query text
        Embedding-->>FastAPI: embedding vector

        FastAPI->>Milvus: search(vector, limit, filters)
        Milvus-->>FastAPI: hits with distances & metadata
    end

    rect rgb(220, 240, 220)
        Note over FastAPI,Format: Response Formatting
        FastAPI->>Format: Apply score_threshold & filters
        Format->>Format: Convert distances to similarity scores
        Format-->>FastAPI: Filtered results list
    end

    FastAPI-->>Client: SearchResponse (OpenAI-compatible JSON)
```
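The request shape in that flow can be sketched as a small client helper. The endpoint path comes from this PR; the host, port, and API key values below are assumptions for local use:

```python
import json

def build_search_request(base_url: str, vector_store_id: str, query: str,
                         api_key: str, max_num_results: int = 5):
    """Build the OpenAI-compatible vector store search request.

    Sketch only: field names follow the Vector Store Search spec this
    PR targets; the Authorization header is what verify_api_key checks.
    """
    url = f"{base_url}/v1/vector_stores/{vector_store_id}/search"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {"query": query, "max_num_results": max_num_results}
    return url, headers, body

url, headers, body = build_search_request(
    "http://localhost:8000", "support_tickets", "GPU driver crash", "demo-key")
print(url)        # → http://localhost:8000/v1/vector_stores/support_tickets/search
print(json.dumps(body))
```

Sending this with `requests.post(url, headers=headers, json=body, timeout=30)` exercises the full diagrammed path.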

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title directly describes the main change: adding an AIDP OpenAI Demo example demonstrating the NVIDIA AI Data Platform Retrieval API, following the OpenAI Vector Store Search specification and exposed via MCP.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 95.24%, above the required 80.00% threshold.



@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 4

🤖 Fix all issues with AI agents
In @examples/aidp_openai_demo/scripts/load_support_tickets.py:
- Around line 44-61: The generate_embeddings function calls requests.post
without a timeout which can hang; update the call to include a reasonable
timeout (e.g., timeout=10) when posting to NVIDIA_EMBED_URL with headers from
get_nvidia_headers and json=payload, and wrap the request in a try/except to
catch requests.Timeout (and requests.RequestException) so you can log or
re-raise a clear error; ensure the modified code still calls
response.raise_for_status() and processes result["data"] into embeddings as
before.
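A hedged sketch of that change (the endpoint URL, timeout value, and error messages below are placeholders; the script defines its own `NVIDIA_EMBED_URL` and header helper):

```python
import requests

def generate_embeddings(texts, url, headers, model, timeout=60):
    """Post texts for embedding with a bounded wait; re-raise with context.

    Mirrors the reviewed function's payload, adding the timeout and
    try/except the review prompt asks for.
    """
    payload = {"input": texts, "model": model, "input_type": "passage",
               "encoding_format": "float", "truncate": "END"}
    try:
        response = requests.post(url, headers=headers, json=payload,
                                 timeout=timeout)
        response.raise_for_status()
    except requests.Timeout as err:
        raise RuntimeError(f"Embedding request timed out after {timeout}s") from err
    except requests.RequestException as err:
        raise RuntimeError(f"Embedding request failed: {err}") from err
    return [item["embedding"] for item in response.json()["data"]]
```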
- Around line 97-106: The LLM API call using requests.post(NVIDIA_LLM_URL,
headers=headers, json=payload) lacks a timeout and can hang; update the call in
load_support_tickets.py to pass a timeout parameter (e.g., timeout=120) to
requests.post and keep response.raise_for_status() as-is so long-running
generation is allowed but bounded; ensure you modify the line that creates the
response (the requests.post invocation) and test that timeouts propagate
cleanly.

In @examples/aidp_openai_demo/src/nat_aidp_openai_demo/rest_api.py:
- Around line 286-293: The current CORS setup uses app.add_middleware with
CORSMiddleware and allow_origins=["*"] together with allow_credentials=True
which permits credentialed requests from any origin; change this for production
by replacing the wildcard with a configurable list of trusted origins (e.g.,
read ALLOWED_ORIGINS from an environment variable), then pass that list to
app.add_middleware (keeping allow_credentials=True if needed) so only explicitly
allowed origins are permitted; update any related configuration or startup code
to parse the ALLOWED_ORIGINS env var (split on commas) and use that variable in
the CORSMiddleware call.

In @examples/aidp_openai_demo/src/nat_aidp_openai_demo/server.py:
- Around line 139-145: The Milvus search call (client.search) can hang
indefinitely; update the search invocation that uses vector_store_id,
query_embedding and max_num_results to pass a per-operation timeout (e.g.,
timeout=30.0) so the call fails fast on unresponsive Milvus; add the timeout
parameter to the client.search(...) argument list and choose an appropriate
value consistent with other timeouts in the codebase.
🧹 Nitpick comments (11)
examples/aidp_openai_demo/README.md (1)

81-103: Consider minor markdown formatting improvements.

The documentation is comprehensive and well-structured. For improved markdown compliance and accessibility:

  1. Add language specifiers to fenced code blocks (lines 81, 141, 230) - helps with syntax highlighting
  2. Wrap the bare URL at line 175 in angle brackets: <http://localhost:3000>
  3. Line 177: Consider using a proper heading (###) instead of bold text for "Alternative: Command Line"

These changes improve document accessibility and comply with markdown best practices.

📝 Suggested markdown improvements

Line 81 - Add language identifier:

-```
+```text
 ┌─────────────┐         REST          ┌─────────────────┐

Line 141 - Add language identifier:

-```
+```text
 Creating collection: support_tickets with explicit schema

Line 175 - Wrap bare URL:

-Navigate to: http://localhost:3000
+Navigate to: <http://localhost:3000>

Line 177 - Use proper heading:

-**Alternative: Command Line**
+### Alternative: Command Line

Line 230 - Add language identifier:

-```
+```http
 POST /v1/vector_stores/{vector_store_id}/search

Also applies to: 141-146, 175-175, 230-232

examples/aidp_openai_demo/src/nat_aidp_openai_demo/server.py (3)

51-55: Consider binding restrictions for production deployments.

The server binds to all network interfaces (host="0.0.0.0"), making it accessible from any network the host is connected to. For a demo, this may be intentional for flexibility, but consider:

  • For production: bind to localhost or specific interface
  • Document the security implications in deployment guidance
  • Consider adding authentication/authorization if exposing externally

159-167: Improve filter parameter validation.

The filter logic requires ALL three parameters (filter_key, filter_type, filter_value) to be present. If a user provides only some of these parameters, they are silently ignored, which could be confusing.

Consider adding explicit validation:

🔍 Proposed improvement for filter validation
                # Apply filters
-               if filter_key and filter_type and filter_value:
+               # Validate filter parameters - all or none must be provided
+               filter_params_provided = [filter_key, filter_type, filter_value]
+               filter_params_count = sum(1 for p in filter_params_provided if p is not None)
+               
+               if filter_params_count > 0 and filter_params_count < 3:
+                   # Partial filter parameters provided - this is likely an error
+                   logger.warning(f"Partial filter parameters provided (count={filter_params_count}). All three (filter_key, filter_type, filter_value) are required for filtering.")
+               
+               if filter_params_count == 3:
                    attr_value = entity.get(filter_key, "")
                    if filter_type == "eq" and attr_value != filter_value:
                        continue
                    elif filter_type == "ne" and attr_value == filter_value:
                        continue
                    elif filter_type == "contains" and filter_value not in str(attr_value):
                        continue

200-204: Improve exception logging to include traceback.

The exception handler should use logging.exception() instead of logging.error() to automatically include the full traceback, which is valuable for debugging.

🔧 Proposed fix
    except Exception as e:
-       logger.error(f"Search error: {e}")
+       logger.exception(f"Search error: {e}")
        return json.dumps({
            "error": {"code": 500, "message": str(e)}
        })
examples/aidp_openai_demo/scripts/load_support_tickets.py (3)

1-4: Shebang placement issue.

The shebang (#!/usr/bin/env python3) should be at the very beginning of the file, before the SPDX license headers, to ensure the script is executable.

Suggested fix
+#!/usr/bin/env python3
 # SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
-
-#!/usr/bin/env python3

205-207: Remove redundant import.

DataType is already imported at line 18; the re-import here is unnecessary.

Proposed fix
 def create_collection(client: MilvusClient):
     """Create the support_tickets collection in Milvus with explicit schema."""
-    from pymilvus import DataType
-    

278-290: Add strict=True to zip() for safety.

Using zip(..., strict=True) would catch potential mismatches between tickets and embeddings list lengths, which could indicate a bug. Also, the loop variable i is unused.

Proposed fix
     import uuid
     data = []
-    for i, (ticket, embedding) in enumerate(zip(tickets, embeddings)):
+    for ticket, embedding in zip(tickets, embeddings, strict=True):
         data.append({
             "pk": str(uuid.uuid4()),  # String primary key
             "vector": embedding,
             "title": ticket["title"][:256],  # Truncate to fit schema
             "text": f"{ticket['title']}. {ticket['content']}"[:4096],  # Use 'text' field for retriever
             "category": ticket.get("category", "General")[:64],
             "severity": ticket.get("severity", "medium")[:32],
             "status": ticket.get("status", "open")[:32]
         })
examples/aidp_openai_demo/src/nat_aidp_openai_demo/rest_api.py (2)

1-4: Shebang placement issue.

Same as in the other file—move the shebang to line 1 for proper script execution.


385-394: Improve error handling and logging.

The exception handling can be improved:

  1. Use logger.exception() to capture the full traceback
  2. Chain the exception with from e to preserve context for debugging
Proposed fix
     try:
         return retrieval_service.search(vector_store_id, request)
     except HTTPException:
         raise
     except Exception as e:
-        logger.error(f"Search error: {e}")
+        logger.exception("Search error")
         raise HTTPException(
             status_code=500,
-            detail=f"Internal system error: {str(e)}"
-        )
+            detail=f"Internal system error: {e!s}"
+        ) from e
examples/aidp_openai_demo/src/nat_aidp_openai_demo/examples.py (2)

1-4: Shebang placement issue.

Same fix needed—move shebang to line 1.


333-350: Simplify health check logic.

The exception handling can be cleaner. Since response.raise_for_status() isn't called, a non-200 status doesn't raise an exception, making the manual check necessary. Consider simplifying:

Proposed fix
     try:
         # Check if API is available
         response = requests.get(f"{AIDP_API_BASE}/", timeout=5)
-        if response.status_code != 200:
-            raise Exception("API not available")
-        
+        response.raise_for_status()
         print(f"✓ AIDP API Server running at {AIDP_API_BASE}")
         
-    except Exception as e:
+    except requests.exceptions.RequestException:
         print(f"""
 ✗ AIDP API Server not running at {AIDP_API_BASE}
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7406837 and 022d366.

📒 Files selected for processing (8)
  • examples/aidp_openai_demo/README.md
  • examples/aidp_openai_demo/pyproject.toml
  • examples/aidp_openai_demo/scripts/load_support_tickets.py
  • examples/aidp_openai_demo/src/nat_aidp_openai_demo/__init__.py
  • examples/aidp_openai_demo/src/nat_aidp_openai_demo/configs/workflow.yml
  • examples/aidp_openai_demo/src/nat_aidp_openai_demo/examples.py
  • examples/aidp_openai_demo/src/nat_aidp_openai_demo/rest_api.py
  • examples/aidp_openai_demo/src/nat_aidp_openai_demo/server.py
🧰 Additional context used
🧬 Code graph analysis (3)
examples/aidp_openai_demo/src/nat_aidp_openai_demo/server.py (1)
examples/aidp_openai_demo/src/nat_aidp_openai_demo/rest_api.py (1)
  • search (169-259)
examples/aidp_openai_demo/src/nat_aidp_openai_demo/rest_api.py (1)
examples/aidp_openai_demo/src/nat_aidp_openai_demo/examples.py (1)
  • search_vector_store (79-130)
examples/aidp_openai_demo/scripts/load_support_tickets.py (1)
examples/aidp_openai_demo/src/nat_aidp_openai_demo/rest_api.py (1)
  • search (169-259)
🪛 Gitleaks (8.30.0)
examples/aidp_openai_demo/src/nat_aidp_openai_demo/rest_api.py

[high] 413-415: Discovered a potential authorization token provided in a curl command header, which could compromise the curl accessed resource.

(curl-auth-header)

🪛 markdownlint-cli2 (0.18.1)
examples/aidp_openai_demo/README.md

81-81: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


141-141: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


175-175: Bare URL used

(MD034, no-bare-urls)


177-177: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


230-230: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Ruff (0.14.10)
examples/aidp_openai_demo/src/nat_aidp_openai_demo/server.py

4-4: Shebang should be at the beginning of the file

(EXE005)


53-53: Possible binding to all interfaces

(S104)


73-73: Avoid specifying long messages outside the exception class

(TRY003)


200-200: Do not catch blind exception: Exception

(BLE001)


201-201: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

examples/aidp_openai_demo/src/nat_aidp_openai_demo/rest_api.py

4-4: Shebang should be at the beginning of the file

(EXE005)


149-149: Avoid specifying long messages outside the exception class

(TRY003)


271-271: Unused function argument: app

(ARG001)


345-345: Unused function argument: api_key

(ARG001)


373-373: Unused function argument: api_key

(ARG001)


389-389: Do not catch blind exception: Exception

(BLE001)


390-390: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


391-394: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


393-393: Use explicit conversion flag

Replace with conversion flag

(RUF010)


419-419: Possible binding to all interfaces

(S104)

examples/aidp_openai_demo/src/nat_aidp_openai_demo/examples.py

4-4: Shebang should be at the beginning of the file

(EXE005)


337-337: Abstract raise to an inner function

(TRY301)


337-337: Create your own exception

(TRY002)


337-337: Avoid specifying long messages outside the exception class

(TRY003)


341-341: Do not catch blind exception: Exception

(BLE001)


341-341: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)

examples/aidp_openai_demo/scripts/load_support_tickets.py

4-4: Shebang should be at the beginning of the file

(EXE005)


36-37: Avoid specifying long messages outside the exception class

(TRY003)


56-56: Probable use of requests call without timeout

(S113)


105-105: Probable use of requests call without timeout

(S113)


121-121: Consider moving this statement to an else block

(TRY300)


207-207: Redefinition of unused DataType from line 18

Remove definition: DataType

(F811)


280-280: Loop control variable i not used within loop body

Rename unused i to _i

(B007)


280-280: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


346-346: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (17)
examples/aidp_openai_demo/src/nat_aidp_openai_demo/__init__.py (1)

1-3: LGTM!

Appropriate SPDX license headers for the package initialization file.

examples/aidp_openai_demo/src/nat_aidp_openai_demo/configs/workflow.yml (1)

18-43: LGTM!

The workflow configuration is well-structured:

  • MCP client correctly points to the server on port 8081 (matches server.py)
  • LLM configuration uses appropriate NVIDIA NIM model with conservative temperature (0) for consistent reasoning
  • Agent setup includes sensible retry logic (max_retries: 2, retry_parsing_errors: true)
examples/aidp_openai_demo/src/nat_aidp_openai_demo/server.py (2)

70-91: LGTM! Good timeout and error handling.

The embedding function includes:

  • Appropriate 30-second timeout for the external API call
  • Proper authentication header handling
  • Error propagation via raise_for_status() for HTTP errors
  • Network errors (connection issues, timeouts) will propagate to the caller's exception handler

207-221: LGTM!

The entry point provides clear server information and correctly starts the MCP server with the streamable-http transport. The banner helpfully displays the endpoint URL, tool name, and Milvus connection details.

examples/aidp_openai_demo/pyproject.toml (1)

11-18: No issues found with dependency versions.

All specified dependencies are available on PyPI and appropriately constrained. The nvidia-nat[langchain,mcp]>=1.4.0a0,<1.5.0 constraint matches multiple actively maintained alpha, beta, and RC builds (the specific version 1.4.0a0 does not exist, but the constraint correctly captures all qualifying versions from 1.4.0a20251008 onward). The pymilvus~=2.6 constraint is satisfied by stable releases 2.6.0 through 2.6.6. For an example project, using alpha versions is appropriate and expected.

examples/aidp_openai_demo/scripts/load_support_tickets.py (3)

343-356: LGTM with note on exception handling.

The broad Exception catch on line 346 is intentional to gracefully fall back to static tickets when LLM generation fails for any reason. This is appropriate for a demo/seeding script.


129-202: LGTM!

Good fallback data with realistic support ticket examples covering various NVIDIA product categories.


238-266: LGTM!

Random vector generation with proper normalization for demo mode is appropriate.
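For reference, unit-norm random vectors for demo mode can be produced with nothing beyond the standard library (the dimension and seed here are assumptions, not values from the script):

```python
import math
import random

def random_unit_vector(dim=1024, seed=None):
    """Generate a random Gaussian vector scaled to unit L2 norm,
    so cosine-style similarity behaves sensibly in demo searches."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

vec = random_unit_vector(dim=8, seed=42)
print(round(math.sqrt(sum(x * x for x in vec)), 6))  # → 1.0
```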

examples/aidp_openai_demo/src/nat_aidp_openai_demo/rest_api.py (6)

300-326: LGTM for demo purposes.

The authentication correctly enforces Bearer token format. The comment on lines 317-318 appropriately documents this is demo-only validation. The api_key parameters in endpoints are not unused—they trigger the verify_api_key dependency for authentication enforcement.
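The Bearer-format check that dependency performs can be illustrated in isolation. This is a pure-Python sketch with assumed names; the real code wires the check through FastAPI's `Depends` and raises `HTTPException(status_code=401)` instead of the stand-in error:

```python
class AuthError(Exception):
    """Stand-in for FastAPI's HTTPException(status_code=401)."""

def verify_api_key(authorization: str) -> str:
    """Require an 'Authorization: Bearer <key>' header and return the key.

    Demo-only: the header format is enforced, the key itself is not
    validated against any store.
    """
    scheme, _, key = authorization.partition(" ")
    if scheme != "Bearer" or not key:
        raise AuthError("Expected 'Bearer <api-key>' authorization header")
    return key

print(verify_api_key("Bearer demo-key"))  # → demo-key
```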


146-167: LGTM!

Good implementation with proper timeout (30s) on the external API call and appropriate error handling.


169-259: LGTM with optimization note.

The search implementation is correct. The post-filtering approach (lines 214-230) works but could be optimized by pushing filters to Milvus using filter parameter in the search call for better performance with large datasets. Current approach is fine for demo/small scale.
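The post-filtering step can be sketched as a pure function over Milvus hits (field names follow the demo's schema; pushing the same predicate down to Milvus would instead pass a boolean expression such as `filter='category == "Networking"'` to `client.search`):

```python
def post_filter(hits, filter_key=None, filter_type=None, filter_value=None):
    """Drop hits that fail the (key, type, value) predicate.

    Passes everything through when no complete filter is supplied,
    mirroring the demo's all-or-nothing filter handling.
    """
    if not (filter_key and filter_type and filter_value):
        return list(hits)
    kept = []
    for entity in hits:
        attr = entity.get(filter_key, "")
        if filter_type == "eq" and attr != filter_value:
            continue
        if filter_type == "ne" and attr == filter_value:
            continue
        if filter_type == "contains" and filter_value not in str(attr):
            continue
        kept.append(entity)
    return kept

hits = [{"category": "Networking"}, {"category": "Drivers"}]
print(post_filter(hits, "category", "eq", "Networking"))  # → [{'category': 'Networking'}]
```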


71-127: LGTM!

Pydantic models are well-structured with appropriate validation constraints and follow the OpenAI API specification.


401-419: LGTM!

Binding to 0.0.0.0 is appropriate for containerized deployment. The Gitleaks warning about the curl example is a false positive—it's documentation showing usage patterns, not an actual secret.


270-277: LGTM!

The app parameter is required by FastAPI's lifespan context manager protocol—the static analysis warning is a false positive.

examples/aidp_openai_demo/src/nat_aidp_openai_demo/examples.py (3)

79-130: LGTM!

Clean implementation with proper timeout handling and correct OpenAI-compatible request format.


133-311: LGTM!

Well-structured examples that clearly demonstrate the API usage patterns and OpenAI alignment.


60-62: LGTM!

Good use of environment variables with sensible defaults for demo purposes.

Comment on lines +44 to +61
def generate_embeddings(texts: List[str]) -> List[List[float]]:
"""Generate embeddings using NVIDIA NIM."""
headers = get_nvidia_headers()

payload = {
"input": texts,
"model": EMBEDDING_MODEL,
"input_type": "passage",
"encoding_format": "float",
"truncate": "END"
}

response = requests.post(NVIDIA_EMBED_URL, headers=headers, json=payload)
response.raise_for_status()

result = response.json()
embeddings = [item["embedding"] for item in result["data"]]
return embeddings
⚠️ Potential issue | 🟠 Major

Add timeout to external API call.

The requests.post() call lacks a timeout, which could cause the script to hang indefinitely if the NVIDIA API is unresponsive. This is flagged by static analysis (S113).

Proposed fix
-    response = requests.post(NVIDIA_EMBED_URL, headers=headers, json=payload)
+    response = requests.post(NVIDIA_EMBED_URL, headers=headers, json=payload, timeout=60)
🧰 Tools
🪛 Ruff (0.14.10)

56-56: Probable use of requests call without timeout

(S113)

🤖 Prompt for AI Agents
In @examples/aidp_openai_demo/scripts/load_support_tickets.py around lines 44 -
61, The generate_embeddings function calls requests.post without a timeout which
can hang; update the call to include a reasonable timeout (e.g., timeout=10)
when posting to NVIDIA_EMBED_URL with headers from get_nvidia_headers and
json=payload, and wrap the request in a try/except to catch requests.Timeout
(and requests.RequestException) so you can log or re-raise a clear error; ensure
the modified code still calls response.raise_for_status() and processes
result["data"] into embeddings as before.

Comment on lines +97 to +106
payload = {
"model": LLM_MODEL,
"messages": [{"role": "user", "content": prompt}],
"temperature": 0.8,
"max_tokens": 8000
}

print(f"Generating {num_tickets} synthetic support tickets using {LLM_MODEL}...")
response = requests.post(NVIDIA_LLM_URL, headers=headers, json=payload)
response.raise_for_status()

⚠️ Potential issue | 🟠 Major

Add timeout to LLM API call.

Similar to the embeddings call, this requests.post() to the LLM endpoint should include a timeout to prevent indefinite hangs. Given that LLM generation may take longer, a higher timeout (e.g., 120s) is appropriate.

Proposed fix
     payload = {
         "model": LLM_MODEL,
         "messages": [{"role": "user", "content": prompt}],
         "temperature": 0.8,
         "max_tokens": 8000
     }
     
     print(f"Generating {num_tickets} synthetic support tickets using {LLM_MODEL}...")
-    response = requests.post(NVIDIA_LLM_URL, headers=headers, json=payload)
+    response = requests.post(NVIDIA_LLM_URL, headers=headers, json=payload, timeout=120)
🧰 Tools
🪛 Ruff (0.14.10)

105-105: Probable use of requests call without timeout

(S113)

🤖 Prompt for AI Agents
In @examples/aidp_openai_demo/scripts/load_support_tickets.py around lines 97 -
106, The LLM API call using requests.post(NVIDIA_LLM_URL, headers=headers,
json=payload) lacks a timeout and can hang; update the call in
load_support_tickets.py to pass a timeout parameter (e.g., timeout=120) to
requests.post and keep response.raise_for_status() as-is so long-running
generation is allowed but bounded; ensure you modify the line that creates the
response (the requests.post invocation) and test that timeouts propagate
cleanly.

Comment on lines +286 to +293
# Add CORS middleware
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
⚠️ Potential issue | 🟡 Minor

Review CORS configuration for production use.

Using allow_origins=["*"] with allow_credentials=True allows any origin to make credentialed requests. While acceptable for local development/demo, this should be restricted in production to specific trusted origins.

Suggested approach for production
# For production, specify allowed origins explicitly
ALLOWED_ORIGINS = os.getenv("ALLOWED_ORIGINS", "http://localhost:3000").split(",")

app.add_middleware(
    CORSMiddleware,
    allow_origins=ALLOWED_ORIGINS,
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
🤖 Prompt for AI Agents
In @examples/aidp_openai_demo/src/nat_aidp_openai_demo/rest_api.py around lines
286 - 293, The current CORS setup uses app.add_middleware with CORSMiddleware
and allow_origins=["*"] together with allow_credentials=True which permits
credentialed requests from any origin; change this for production by replacing
the wildcard with a configurable list of trusted origins (e.g., read
ALLOWED_ORIGINS from an environment variable), then pass that list to
app.add_middleware (keeping allow_credentials=True if needed) so only explicitly
allowed origins are permitted; update any related configuration or startup code
to parse the ALLOWED_ORIGINS env var (split on commas) and use that variable in
the CORSMiddleware call.

Comment on lines +139 to +145
# Search Milvus
search_results = client.search(
collection_name=vector_store_id,
data=[query_embedding],
limit=max_num_results,
output_fields=["pk", "title", "text", "category", "severity", "status"]
)
⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cat -n examples/aidp_openai_demo/src/nat_aidp_openai_demo/server.py | sed -n '130,155p'

Repository: NVIDIA/NeMo-Agent-Toolkit-Examples

🌐 Web query:

pymilvus MilvusClient timeout configuration search operations

💡 Result:

Short answer: set a timeout per operation by passing timeout to search(), or set a client connection timeout in MilvusClient(...), which only applies to establishing the connection and is not a default for all RPCs.

Examples

  • Per-search timeout:
    client.search(collection_name="mycol", data=vectors, limit=10, timeout=5.0). [1]

  • Client constructor timeout (connection timeout only):
    client = MilvusClient(uri="http://localhost:19530", timeout=10)  # used for establishing the connection. [2][3]

Notes

  • Most MilvusClient methods (search, query, load_collection, list_collections, etc.) accept a timeout: Optional[float] parameter that controls that RPC's deadline (None = wait until response/error). [1][3]

Sources
[1] pymilvus search() API reference. (milvus.io)
[2] Connect/Configure a timeout (client connection) in Milvus docs. (blog.milvus.io)
[3] MilvusClient(...) constructor / timeout parameter. (milvus.io)

🏁 Script executed:

# Search for timeout patterns in the codebase
rg -i "timeout" examples/aidp_openai_demo/ -B 2 -A 2

🏁 Script executed:

# Check imports and client initialization
cat -n examples/aidp_openai_demo/src/nat_aidp_openai_demo/server.py | sed -n '1,50p'


Add timeout parameter to Milvus search operation to prevent indefinite hangs.

The client.search() call at lines 140-145 lacks a timeout configuration. If the Milvus server becomes unresponsive, the operation will block indefinitely. The pymilvus library supports per-operation timeouts via the timeout parameter.

Consider adding a timeout to the search operation, for example:

search_results = client.search(
    collection_name=vector_store_id,
    data=[query_embedding],
    limit=max_num_results,
    output_fields=["pk", "title", "text", "category", "severity", "status"],
    timeout=30.0
)

This aligns with the timeout patterns already used elsewhere in the codebase (e.g., 30s for API calls). Adjust the timeout value based on expected Milvus response time for your dataset and hardware.
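One way to keep that timeout in a single place is a thin wrapper (a sketch; the name and the error mapping are illustrative, not pymilvus API) that forwards a per-operation timeout and re-raises failures with a clearer message:

```python
def search_with_timeout(search_fn, *, timeout: float = 30.0, **kwargs):
    """Invoke a search callable with an explicit per-operation timeout.

    Assumes search_fn accepts a `timeout` keyword, as pymilvus
    MilvusClient.search does; any failure (including an expired RPC
    deadline) is re-raised with the timeout recorded in the message.
    """
    try:
        return search_fn(timeout=timeout, **kwargs)
    except Exception as exc:
        raise RuntimeError(
            f"vector store search failed within {timeout}s: {exc}"
        ) from exc
```

Call sites then stay close to the original, e.g. `search_with_timeout(client.search, collection_name=vector_store_id, data=[query_embedding], limit=max_num_results)`.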


@PicoNVIDIA
Contributor Author

@AnuradhaKaruppiah, please review when you have time.

Thanks for the help.

@AnuradhaKaruppiah
Contributor

/ok to test 022d366

@willkill07 willkill07 added feature request New feature or request non-breaking Non-breaking change labels Jan 27, 2026
@willkill07 willkill07 self-assigned this Jan 27, 2026
Member

Should this live under src/ if it's intended to be run by the user?

Member

Why not expose these tools through NAT? nat mcp serve --config_file <>

This is non-blocking, but it would be nice to see tighter integration.

I also recommend opening an Issue+PR on NeMo Agent Toolkit that dynamically adds vector store endpoints to the FastAPI front-end. /v1/vector_stores/{vector_store_id}/search

We can add an optional list of RetrieverRefs to the FastAPI configuration :)

Contributor Author

Some partners do not want to use NAT, only the server spec; I used NAT as a reference and recommended it when they didn't already have one.

Member

I understand that, but this is also a NeMo Agent Toolkit Examples repository.

requires-python = ">=3.11"

dependencies = [
    "nvidia-nat[langchain,mcp]>=1.4.0a0,<1.5.0",
Member

We follow semantic versioning as closely as possible, so whatever you pin now should keep working until 2.0.0.

Suggested change
"nvidia-nat[langchain,mcp]>=1.4.0a0,<1.5.0",
"nvidia-nat[langchain,mcp]>=1.4.0a0,<2.0.0",

Comment on lines +14 to +17
    "fastmcp",
    "fastapi",
    "uvicorn",
    "requests",
Member

Leaving all of these unversioned seems odd. Maybe add a comment that says # versions determined by NeMo Agent Toolkit
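For example, the dependency block could carry that comment directly (a sketch of the pyproject fragment, with the transitive versions intentionally left to NAT's pins):

```toml
dependencies = [
    "nvidia-nat[langchain,mcp]>=1.4.0a0,<2.0.0",
    # versions determined by NeMo Agent Toolkit
    "fastmcp",
    "fastapi",
    "uvicorn",
    "requests",
]
```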

version = "0.1.0"
description = "AIDP Retrieval API Demo with NVIDIA NIMs - OpenAI Vector Store Search specification via MCP"
readme = "README.md"
requires-python = ">=3.11"
Member

Suggested change
requires-python = ">=3.11"
requires-python = ">=3.11,<3.14"


```bash
export NVIDIA_API_KEY=<YOUR_API_KEY>
nat serve --config_file src/nat_aidp_openai_demo/configs/workflow.yml --port 8000
```
Member

@willkill07 willkill07 Jan 27, 2026

It is normal for examples to have a symbolic link from configs -> src/<package>/configs

That way you can have:

--config_file configs/workflow.yml
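That symlink convention can be set up like this (a sketch, run from the example's root directory; the package path is taken from the commands above):

```shell
# Link the repo-convention configs/ location to the packaged configs.
# ln -s succeeds even if the target does not exist yet, so order does not matter.
ln -s src/nat_aidp_openai_demo/configs configs
```

After that, `nat serve --config_file configs/workflow.yml` resolves to the same file as the longer path.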

@willkill07
Member

@PicoNVIDIA could you please address the review feedback?
